Acoustic measures vs. phonetic features as predictors of audible discontinuity in concatenative speech synthesis

Authors

  • Hisashi Kawai
  • Minoru Tsuzaki
Abstract

Most concatenative speech synthesizers use both acoustic measures and phonetic features to predict the perceptual damage caused by concatenating two waveform segments, because no reliable acoustic measure has been found so far. This paper compares the predictive ability of these two kinds of predictor variables. First, we conduct a perceptual experiment to measure the naturalness degradation caused by the signal discontinuity that concatenating waveform segments introduces. Second, we predict the naturalness degradation score from acoustic measures derived from MFCC and/or phonetic features, using statistical models such as multiple regression. Based on an investigation of the multiple regression coefficients, we found that (1) the phonetic features are more effective and (2) the acoustic measures provide no useful information beyond the phonetic features.
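As a rough illustration of the prediction step described in the abstract, the sketch below fits three least-squares regressions (acoustic only, phonetic only, and both) and compares how much variance each explains. Everything in it is an assumption made for illustration: the data are random placeholders rather than listener ratings, the acoustic measure is taken to be a Euclidean distance between the MFCC vectors on either side of the join, the phonetic feature is a single hypothetical three-way phone class, and the helper names `mfcc_distance`, `one_hot`, and `fit_r2` are invented here. The abstract speaks of examining regression coefficients; comparing R² values is a simplification and is not claimed to be the authors' procedure.

```python
import numpy as np


def mfcc_distance(left_frame, right_frame):
    """Euclidean distance between the MFCC vectors on either side of a join."""
    diff = np.asarray(left_frame, dtype=float) - np.asarray(right_frame, dtype=float)
    return float(np.linalg.norm(diff))


def one_hot(labels, categories):
    """Encode categorical labels as 0/1 columns.

    The first category is dropped so the columns are not collinear
    with the intercept added in fit_r2.
    """
    labels = np.asarray(labels)
    return np.column_stack([(labels == c).astype(float) for c in categories[1:]])


def fit_r2(X, y):
    """Fit ordinary least squares with an intercept and return R^2."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


# Synthetic placeholder data (assumption): 200 joins, 12 MFCCs per frame.
rng = np.random.default_rng(0)
n = 200
categories = ["vowel", "nasal", "fricative"]   # hypothetical phone classes
phone_class = rng.choice(categories, size=n)
left = rng.normal(size=(n, 12))                # MFCC frame before each join
right = rng.normal(size=(n, 12))               # MFCC frame after each join
scores = rng.normal(size=n)                    # stand-in for degradation scores

acoustic = np.array([[mfcc_distance(l, r)] for l, r in zip(left, right)])
phonetic = one_hot(phone_class, categories)

r2_acoustic = fit_r2(acoustic, scores)
r2_phonetic = fit_r2(phonetic, scores)
r2_both = fit_r2(np.column_stack([acoustic, phonetic]), scores)
```

With real perceptual scores in place of `scores`, a combined model whose R² barely exceeds the phonetic-only model would be consistent with the abstract's second finding. With the random placeholders used here, the numbers carry no meaning.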

Related resources

Data-driven perceptually based join costs

Concatenative speech synthesis systems attempt to minimize audible discontinuities between two successive concatenated units. In unit selection concatenative synthesis, a join cost is calculated that is intended to predict the extent of audible discontinuity introduced by the concatenation of two specific units. A study was conducted that used human perceptual data on the detectability of mid-v...

Phonetic effects on listener detection of vowel concatenation

Concatenative speech synthesis quality depends in part on the minimization of audible discontinuities between two successive concatenated units. This study focuses on human detection of concatenation discontinuities in synthetic speech. Statistical analyses compared for various phonetic categories the results observed in perceptual tests with two voices – one female and one male. Neither a comp...

Objective distance measures for spectral discontinuities in concatenative speech synthesis

The quality of unit selection based concatenative speech synthesis mainly depends on how well two successive units can be joined together to minimise the audible discontinuities. The objective measure of discontinuity used when selecting units is known as the join cost. The ideal join cost will measure perceived discontinuity, based on easily measurable spectral properties of the units being jo...

Discontinuity Removal in Concatenative Synthesized Speech

Concatenative synthesis concatenates segments of prerecorded natural human speech. It requires a database of previously recorded human speech covering all the possible segments to be synthesised. A segment might be a phoneme, syllable, word, phrase, or any combination. Concatenative speech synthesis is currently the most practical method for the generation of realistic speech. There are mainly two types ...

Feature extraction for unit selection in concatenative speech synthesis: comparison between AIM, LPC, and MFCC

A comprehensive computational model of the human auditory peripherals (AIM) was applied to extract basic features of speech sounds aiming at optimal unit selection in concatenative speech synthesis. The performance of AIM was compared to that of a purely physical model (LPC) as well as that of an approximate auditory model (MFCC) by basic perceptual experiments. While a significant advantage of...

Publication date: 2002